Static and Dynamic Big Data Partitioning on Apache Spark
Authors
Abstract
Many of today’s large datasets are organized as graphs. Because of their size, it is often infeasible to process these graphs on a single machine. Many software frameworks and tools have therefore been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this paper we study how a specific data partitioning strategy affects the performance of graph algorithms executing on Apache Spark. To this end, we implemented different graph algorithms and compared their performance under a naive partitioning solution against more elaborate strategies, both static and dynamic.
Similar resources
The comparison effects of eight weeks spark and frenkel exercises on static and dynamic balance in the blinds
Introduction: Vision is one of the most important human senses, and its loss causes many primary and secondary complications for physical and psychological health, such as difficulties with static and dynamic balance. This study aimed to compare the effects of 8 weeks of Spark and Frenkel exercise training on static and dynamic balance in blind people. ...
The STARK Framework for Spatio-Temporal Data Analytics on Spark
Big Data sets can contain all types of information: from server log files to tracking information of mobile users with their location at a point in time. Apache Spark has been widely accepted for Big Data analytics because of its very fast processing model. However, Spark has no native support for spatial or spatio-temporal data. Spatial filters or joins using, e.g., a contains predicate are no...
Efficient spatio-temporal event processing with STARK
For Big Data processing, Apache Spark has been widely accepted. However, when dealing with events or any other spatio-temporal data sets, Spark becomes very inefficient as it does not include any spatial or temporal data types and operators. In this paper we demonstrate our STARK project that adds the required data types and operators, such as spatio-temporal filter and join with various predic...
Dynamic Multi-Objective Optimization with jMetal and Spark: A Case Study
Technologies for Big Data and Data Science are receiving increasing research interest nowadays. This paper introduces the prototyping architecture of a tool aimed to solve Big Data Optimization problems. Our tool combines the jMetal framework for multi-objective optimization with Apache Spark, a technology that is gaining momentum. In particular, we make use of the streaming facilities of Spark...
An Adaptive Partitioning Scheme for Ad-hoc and Time-varying Database Analytics
Data partitioning significantly improves query performance in distributed database systems. A large number of techniques have been proposed to efficiently partition a dataset, often focusing on finding the best partitioning for a particular query workload. However, many modern analytic applications involve ad-hoc or exploratory analysis where users do not have a representative query workload. F...